Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science [1, 2, 3]. Recent studies suggest that networks often exhibit hierarchical organization, in which vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases these groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks, or genetic regulatory networks), or communities in social networks [4, 5, 6, 7]. Here we present a general technique for inferring hierarchical structure from network data and demonstrate that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients, and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partially known networks with high accuracy, and for more general network structures than competing techniques [8]. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.

A great deal of recent work has been devoted to the study of clustering and community structure in networks [9, 10]. Hierarchical structure goes beyond simple clustering, however, by explicitly including organization at all scales in a network simultaneously. Conventionally, hierarchical structure is represented by a tree or dendrogram in which closely related pairs of vertices have lowest common ancestors that are lower in the tree than those of more distantly related pairs (see Fig. 1). We expect the probability of a connection between two vertices to depend on their degree of relatedness. Structure of this type can be modelled mathematically using a probabilistic approach in which we endow each internal node $r$ of the dendrogram with a probability $p_r$ and then connect each pair of vertices for whom $r$ is the lowest common ancestor independently with probability $p_r$ (Fig. 1).
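The generative rule just described can be illustrated with a minimal Python sketch. It assumes a dendrogram stored as a dict mapping each internal node to its (left, right) children, with leaves labelled 0 to n-1, and a second dict holding $p_r$ for each internal node; the names `leaves_under` and `sample_hrg` are hypothetical, and this is not the authors' code.

```python
import random
from itertools import product


def leaves_under(node, children):
    """Return the leaves (vertices) descended from `node`."""
    if node not in children:  # leaves have no children
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def sample_hrg(children, prob, seed=None):
    """Draw one graph from the hierarchical random graph (D, {p_r}).

    A pair (i, j) has internal node r as its lowest common ancestor exactly
    when one vertex lies in r's left subtree and the other in its right
    subtree, so iterating over left-by-right leaf pairs of every internal
    node visits each vertex pair once and connects it with probability p_r.
    """
    rng = random.Random(seed)
    edges = set()
    for r, (left, right) in children.items():
        for i, j in product(leaves_under(left, children),
                            leaves_under(right, children)):
            if rng.random() < prob[r]:
                edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical 4-vertex dendrogram: groups {0, 1} and {2, 3} joined at root 6.
children = {4: (0, 1), 5: (2, 3), 6: (4, 5)}
prob = {4: 0.9, 5: 0.9, 6: 0.1}  # dense inside groups, sparse between them
print(sorted(sample_hrg(children, prob, seed=42)))
```

Because every pair is visited exactly once at its lowest common ancestor, no explicit ancestor lookup is needed when generating a whole graph.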

This model, which we call a hierarchical random graph, is similar in spirit (although different in realization) to the tree-based models used in some studies of network search and navigation [11, 12, 13]. Like most work on community structure, it assumes that communities at each level of organization are disjoint. Overlapping communities have occasionally been studied (see, for example, [14]) and could be represented using a more elaborate probabilistic model, but as we discuss below the present model already captures many of the structural features of interest.

Given a dendrogram and a set of probabilities $p_r$, the hierarchical random graph model allows us to generate artificial networks with a specified hierarchical structure, a procedure that might be useful in certain situations. Our goal here, however, is a different one. We would like to detect and analyze the hierarchical structure, if any, of networks in the real world. We accomplish this by fitting the hierarchical model to observed network data using the tools of statistical inference, combining a maximum likelihood approach [15] with a Monte Carlo sampling algorithm [16] on the space of all possible dendrograms. This technique allows us to sample hierarchical random graphs with probability proportional to the likelihood that they generate the observed network. To obtain the results described below we combine information from a large number of such samples, each of which is a reasonably likely model of the data.

FIG. 1: A hierarchical network with structure on many scales and the corresponding hierarchical random graph. Each internal node $r$ of the dendrogram is associated with a probability $p_r$ that a pair of vertices in the left and right subtrees of that node are connected. (The shades of the internal nodes in the figure represent the probabilities.)

Network       ⟨k⟩ real  ⟨k⟩ resampled  C real   C resampled  d real  d resampled
T. pallidum   4.8       3.7(1)         0.0625   0.0444(2)    3.690   3.940(6)
Terrorists    4.9       5.1(2)         0.361    0.352(1)     2.575   2.794(7)
Grassland     3.0       2.9(1)         0.174    0.168(1)     3.29    3.69(2)

TABLE I: Comparison of network statistics for the three example networks studied and new networks generated by resampling from our hierarchical model. The generated networks closely match the average degree ⟨k⟩, clustering coefficient C, and average vertex-vertex distance d in each case, suggesting that they capture much of the real networks' structure. Parenthetical values indicate standard errors on the final digits.

The success of this approach relies on the flexible nature of our hierarchical model, which allows us to fit a wide range of network structures. The traditional picture of communities or modules in a network, for example, corresponds to connections that are dense within groups of vertices and sparse between them, a behaviour called "assortativity" in the literature [17]. The hierarchical random graph can capture behaviour of this kind using probabilities $p_r$ that decrease as we move higher up the tree. Conversely, probabilities that increase as we move up the tree correspond to "disassortative" structures in which vertices are less likely to be connected on small scales than on large ones. By letting the $p_r$ values vary arbitrarily throughout the dendrogram, the hierarchical random graph can capture both assortative and disassortative structure, as well as arbitrary mixtures of the two, at all scales and in all parts of the network.

To demonstrate our method we have used it to construct hierarchical decompositions of three example networks drawn from disparate fields: the metabolic network of the spirochete Treponema pallidum [18], a network of associations between terrorists [19], and a food web of grassland species [20]. To test whether these decompositions accurately capture the networks' important structural features, we use the sampled dendrograms to generate new networks, different in detail from the originals but, by definition, having similar hierarchical structure (see the Supplementary Information for more details). We find that these "resampled" networks match the statistical properties of the originals closely, including their degree distributions, clustering coefficients, and distributions of shortest path lengths between pairs of vertices, despite the fact that none of these properties is explicitly represented in the hierarchical random graph (Table I and Fig. S3 in the Supplementary Information). Thus it appears that a network's hierarchical structure is capable of explaining a wide variety of other network features as well.

FIG. 2: Application of the hierarchical decomposition to the network of grassland species interactions. a, Consensus dendrogram reconstructed from the sampled hierarchical models. b, A visualization of the network in which the upper few levels of the consensus dendrogram are shown as boxes around species (plants, herbivores, parasitoids, hyper-parasitoids and hyper-hyper-parasitoids are shown as circles, boxes, down triangles, up triangles and diamonds, respectively). Note that in several cases a set of parasitoids is grouped into a disassortative community by the algorithm, not because they prey on each other, but because they prey on the same herbivore.

The dendrograms produced by our method are also of interest in themselves, as a graphical representation and summary of the hierarchical structure of the observed network. As discussed above, our method generates not just a single dendrogram but a set of dendrograms, each of which is a good fit to the data. From this set we can, using techniques from phylogeny reconstruction [21], create a single consensus dendrogram, which captures the topological features that appear consistently across all or a large fraction of the dendrograms and typically represents a better summary of the network's structure than any individual dendrogram. Figure 2a shows such a consensus dendrogram for the grassland species network, which clearly reveals communities and sub-communities of plants, herbivores, parasitoids, and hyper-parasitoids.

Another application of the hierarchical decomposition is the prediction of missing interactions in networks. In many settings, the discovery of interactions in a network requires significant experimental effort in the laboratory or the field. As a result, our current pictures of many networks are substantially incomplete [22, 23, 24, 25, 26, 27]. An attractive alternative to checking exhaustively for a connection between every pair of vertices in a network is to try to predict, in advance and based on the connections already observed, which vertices are most likely to be connected, so that scarce experimental resources can be focused on testing for those interactions. If our predictions are good, we can in this way reduce substantially the effort required to establish the network's topology.

The hierarchical decomposition can be used as the basis for an effective method of predicting missing interactions as follows. Given an observed but incomplete network, we generate as described above a set of hierarchical random graphs (dendrograms and the associated probabilities $p_r$) that fit that network. Then we look for pairs of vertices that have a high average probability of connection within these hierarchical random graphs but which are unconnected in the observed network. These pairs we consider the most likely candidates for missing connections. (Technical details of the procedure are given in the Supplementary Information.)

We demonstrate the method using our three example networks again. For each network we remove a subset of connections chosen uniformly at random and then attempt to predict, based on the remaining connections, which ones have been removed. A standard metric for quantifying the accuracy of prediction algorithms, commonly used in the medical and machine learning communities, is the AUC statistic, which is equivalent to the area under the receiver-operating characteristic (ROC) curve [29]. In the present context, the AUC statistic can be interpreted as the probability that a randomly chosen missing connection (a true positive) is given a higher score by our method than a randomly chosen pair of unconnected vertices (a true negative). Thus, the degree to which the AUC exceeds 1/2 indicates how much better our predictions are than chance. Figure 3 shows the AUC statistic for the three networks as a function of the fraction of the connections known to the algorithm. For all three networks our algorithm does far better than chance, indicating that hierarchy is a strong general predictor of missing structure. It is also instructive to compare the performance of our method to that of other methods for link prediction [8]. Previously proposed methods include assuming that vertices are likely to be connected if they have many common neighbours, if there are short paths between them, or if the product of their degrees is large. These approaches work well for strongly assortative networks such as the collaboration and citation networks [8] and for the metabolic and terrorist networks studied here (Fig. 3a,b). Indeed, for the metabolic network the shortest-path heuristic performs better than our algorithm.
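The probabilistic reading of the AUC given above translates directly into code. The following is a minimal sketch, assuming scores are supplied as a dict keyed by vertex pair; the function name `auc` is hypothetical, and counting ties as one half is a common convention that the text does not specify.

```python
def auc(scores, missing, nonedges):
    """Probability that a missing edge outscores a true non-edge.

    scores: dict mapping each candidate vertex pair to its predicted score.
    missing: pairs removed from the network (true positives).
    nonedges: pairs absent from the full network (true negatives).
    Ties count as one half (an assumed convention).
    """
    wins = 0.0
    for m in missing:
        for u in nonedges:
            if scores[m] > scores[u]:
                wins += 1.0
            elif scores[m] == scores[u]:
                wins += 0.5
    return wins / (len(missing) * len(nonedges))


scores = {(0, 1): 0.9, (1, 2): 0.5, (0, 2): 0.2, (2, 3): 0.5}
print(auc(scores, missing=[(0, 1), (1, 2)], nonedges=[(0, 2), (2, 3)]))  # 0.875
```

The exhaustive double loop is quadratic in the number of pairs; for large networks one would instead estimate the same probability from randomly drawn (missing, non-edge) comparisons.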

However, these simple methods can be misleading for networks that exhibit more general types of structure. In food webs, for instance, pairs of predators often share prey species, but rarely prey on each other. In such situations a common-neighbour or shortest-path-based method would predict connections between predators where none exist. The hierarchical model, by contrast, is capable of expressing both assortative and disassortative structure and, as Fig. 3c shows, gives substantially better predictions for the grassland network. (Indeed, in Fig. 2 there are several groups of parasitoids that our algorithm has grouped together in a disassortative community, in which they prey on the same herbivore but not on each other.) The hierarchical method thus makes accurate predictions for a wider range of network structures than the previous methods.

FIG. 3: Comparison of link prediction methods. Average AUC statistic, i.e., the probability of ranking a true positive over a true negative, as a function of the fraction of connections known to the algorithm, for the link prediction method presented here and a variety of previously published methods.

In the applications above, we have assumed for simplicity that there are no false positives in our network data, i.e., that every observed edge corresponds to a real interaction. In networks where false positives may be present, however, they too could be predicted using the same approach: we would simply look for pairs of vertices that have a low average probability of connection within the hierarchical random graph but which are connected in the observed network.
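The false-positive variant simply reverses the ranking and restricts it to observed edges. A minimal sketch, assuming the averaged connection probabilities have already been computed from the sampled dendrograms and stored in a dict keyed by (i, j) with i < j; the name `rank_suspect_edges` and the numbers are hypothetical.

```python
def rank_suspect_edges(observed_edges, mean_prob):
    """Order observed edges from least to most plausible.

    mean_prob maps each vertex pair (i, j), i < j, to its connection
    probability averaged over sampled dendrograms; observed edges with the
    lowest averages are the first candidates for false positives.
    """
    return sorted(observed_edges, key=lambda pair: mean_prob[pair])


mean_prob = {(0, 1): 0.92, (1, 2): 0.04, (2, 3): 0.61}
print(rank_suspect_edges([(0, 1), (1, 2), (2, 3)], mean_prob))
# [(1, 2), (2, 3), (0, 1)]
```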

The method described here could also be extended to incorporate domain-specific information, such as species morphological or behavioural traits for food webs [28] or phylogenetic or binding-domain data for biochemical networks [23], by adjusting the probabilities of edges accordingly. As the results above show, however, we can obtain good predictions even in the absence of such information, indicating that topology alone can provide rich insights.

In closing, we note that our approach differs crucially from previous work on hierarchical structure in networks [...], in that it acknowledges explicitly that most real-world networks have many plausible hierarchical representations of roughly equal likelihood. Previous work, by contrast, has typically sought a single hierarchical representation for a given network. By sampling an ensemble of dendrograms, our approach avoids over-fitting the data and allows us to explain many common topological features, generate resampled networks with similar structure to the original, derive a clear and concise summary of a network's structure via its consensus dendrogram, and accurately predict missing connections in a wide variety of situations.
SUPPLEMENTARY INFORMATION



APPENDIX A: HIERARCHICAL RANDOM GRAPHS 




Our model for the hierarchical organization of a network is as follows.¹ Let $G$ be a graph with $n$ vertices. A dendrogram $D$ is a binary tree with $n$ leaves corresponding to the vertices of $G$. Each of the $n-1$ internal nodes of $D$ corresponds to the group of vertices that are descended from it. We associate a probability $p_r$ with each internal node $r$. Then, given two vertices $i, j$ of $G$, the probability $p_{ij}$ that they are connected by an edge is $p_{ij} = p_r$, where $r$ is their lowest common ancestor in $D$. The combination $(D, \{p_r\})$ of the dendrogram and the set of probabilities then defines a hierarchical random graph.
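The rule $p_{ij} = p_r$ for a single pair can be sketched with an explicit lowest-common-ancestor lookup. This assumes the same hypothetical storage as elsewhere (a dict from internal node to its two children, leaves labelled 0 to n-1, and a dict of $p_r$ values); the helper names are illustrative only.

```python
def parent_map(children):
    """Invert the child lists: map every non-root node to its parent."""
    return {child: r for r, pair in children.items() for child in pair}


def lowest_common_ancestor(i, j, parent):
    """Walk up from i recording ancestors, then up from j until one is hit."""
    ancestors = {i}
    node = i
    while node in parent:  # stop at the root, which has no parent
        node = parent[node]
        ancestors.add(node)
    node = j
    while node not in ancestors:
        node = parent[node]
    return node


def edge_probability(i, j, children, prob):
    """p_ij = p_r, where r is the lowest common ancestor of vertices i != j."""
    r = lowest_common_ancestor(i, j, parent_map(children))
    return prob[r]


children = {4: (0, 1), 5: (2, 3), 6: (4, 5)}
prob = {4: 0.8, 5: 0.7, 6: 0.05}
print(edge_probability(0, 1, children, prob))  # 0.8
print(edge_probability(1, 2, children, prob))  # 0.05
```

Walking up the tree costs time proportional to the dendrogram's depth per query, which is adequate for occasional lookups.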

Note that if a community has, say, three subcommunities, with an equal probability $p$ of connections between them, we can represent this in our model by first splitting one of these subcommunities off, and then splitting the other two. The two internal nodes corresponding to these splits would be given the same probabilities $p_r = p$. This yields three possible binary dendrograms, which are all considered equally likely.
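A direct check confirms that the three binary orderings describe the same random graph. The sketch below, using hypothetical subcommunities A = {0, 1}, B = {2, 3}, C = {4, 5} and illustrative probabilities, builds the full table of $p_{ij}$ for each ordering and compares them.

```python
from itertools import product


def leaves_under(node, children):
    """Return the leaves descended from `node` (a leaf returns itself)."""
    if node not in children:
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def pair_probabilities(children, prob):
    """Map every vertex pair (i < j) to p_r of its lowest common ancestor r."""
    table = {}
    for r, (left, right) in children.items():
        for i, j in product(leaves_under(left, children),
                            leaves_under(right, children)):
            table[(min(i, j), max(i, j))] = prob[r]
    return table


p_in, p = 0.9, 0.2
base = {6: (0, 1), 7: (2, 3), 8: (4, 5)}  # internal nodes for A, B, C
orders = [
    {9: (6, 7), 10: (9, 8)},  # split C off first: ((A, B), C)
    {9: (6, 8), 10: (9, 7)},  # split B off first: ((A, C), B)
    {9: (7, 8), 10: (9, 6)},  # split A off first: ((B, C), A)
]
prob = {6: p_in, 7: p_in, 8: p_in, 9: p, 10: p}  # both top splits get p
tables = [pair_probabilities({**base, **top}, prob) for top in orders]
print(all(t == tables[0] for t in tables))  # True
```

Since every edge probability is identical across the three trees, so is the likelihood of any observed network, which is why they are equally likely.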

We can think of the hierarchical random graph as a variation on the classical Erdős–Rényi random graph $G(n,p)$. As in that model, the presence or absence of an edge between any pair of vertices is independent of the presence or absence of any other edge. However, whereas in $G(n,p)$ every pair of vertices has the same probability $p$ of being connected, in the hierarchical random graph the probabilities are inhomogeneous, with the inhomogeneities controlled by the topological structure of the dendrogram $D$ and the parameters $\{p_r\}$. Many other models with inhomogeneous edge probabilities have, of course, been studied in the past. One example is a structured random graph in which there are a finite number of types of vertices with a matrix $p_{ij}$ giving the connection probabilities between them.²



APPENDIX B: FITTING THE HIERARCHICAL RANDOM GRAPH TO DATA

Now we turn to the question of finding the hierarchical random graph or graphs that best fit the observed real-world network $G$. Assuming that all hierarchical random graphs are a priori equally likely, the probability that a given model $(D, \{p_r\})$ is the correct explanation of the data is, by Bayes' theorem, proportional to the likelihood $\mathcal{L}$ with which that model generates the observed network.³ Our goal is to maximize $\mathcal{L}$ or, more generally, to sample the space of all models with probability proportional to $\mathcal{L}$.

Let $E_r$ be the number of edges in $G$ whose endpoints have $r$ as their lowest common ancestor in $D$, and let $L_r$ and $R_r$, respectively, be the numbers of leaves in the left and right subtrees rooted at $r$. Then the likelihood of the hierarchical random graph is

$$\mathcal{L}(D, \{p_r\}) = \prod_{r \in D} p_r^{E_r} \, (1 - p_r)^{L_r R_r - E_r}, \qquad \text{(B1)}$$

with the convention that $0^0 = 1$.

If we fix the dendrogram $D$, it is easy to find the probabilities $\{\bar{p}_r\}$ that maximize $\mathcal{L}(D, \{p_r\})$. For each $r$, they are given by

$$\bar{p}_r = \frac{E_r}{L_r R_r}, \qquad \text{(B2)}$$

the fraction of potential edges between the two subtrees of $r$ that actually appear in the graph $G$. The likelihood of the dendrogram evaluated at this maximum is then

$$\mathcal{L}(D) = \prod_{r \in D} \left[ \bar{p}_r^{\,\bar{p}_r} \, (1 - \bar{p}_r)^{1 - \bar{p}_r} \right]^{L_r R_r}. \qquad \text{(B3)}$$

Figure S1 shows an illustrative example, consisting of a network with six vertices.

FIG. S1: An example network $G$ consisting of six vertices, and the likelihood of two possible dendrograms. The internal nodes $r$ of each dendrogram are labeled with the maximum-likelihood probability $\bar{p}_r$, i.e., the fraction of potential edges between their left and right subtrees that exist in $G$. According to Eq. (B3), the likelihoods of the two dendrograms are $\mathcal{L}(D_1) = (1/3)(2/3)^2 \cdot (1/4)^2 (3/4)^6 = 0.00165\ldots$ and $\mathcal{L}(D_2) = (1/9)(8/9)^8 = 0.0433\ldots$. The second dendrogram is far more likely because it correctly divides the network into two highly connected subgraphs at the first level.

1 Computer code implementing many of the analysis methods described in this paper can be found online at www.santafe.edu/~aaronc/randomgraphs/.

2 F. McSherry, "Spectral Partitioning of Random Graphs." Proc. Foundations of Computer Science (FOCS), pp. 529-537 (2001).

3 G. Casella and R. L. Berger, "Statistical Inference." Duxbury Press, Belmont (2001).
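Equations (B1) to (B4) can be evaluated with a short Python sketch. It assumes the dendrogram is a dict from internal node to (left, right) children and the graph is a set of (i, j) pairs with i < j; the helper names are hypothetical. The example graph (two triangles joined by one edge) is an illustrative choice and not necessarily the network drawn in Fig. S1, although its natural dendrogram yields the same factor $(1/9)(8/9)^8$ quoted for $D_2$.

```python
import math
from itertools import product


def leaves_under(node, children):
    """Return the leaves descended from `node` (a leaf returns itself)."""
    if node not in children:
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def node_counts(children, edges):
    """Return {r: (E_r, L_r, R_r)} for every internal node r of D."""
    counts = {}
    for r, (left, right) in children.items():
        lv, rv = leaves_under(left, children), leaves_under(right, children)
        e_r = sum((min(i, j), max(i, j)) in edges for i, j in product(lv, rv))
        counts[r] = (e_r, len(lv), len(rv))
    return counts


def log_likelihood(counts):
    """log L(D) from Eqs. (B1)-(B3), using p_r = E_r / (L_r R_r).

    Terms with a zero exponent are skipped, implementing 0^0 = 1.
    """
    total = 0.0
    for e_r, l_r, r_r in counts.values():
        pairs = l_r * r_r
        p_r = e_r / pairs
        if e_r > 0:
            total += e_r * math.log(p_r)
        if pairs - e_r > 0:
            total += (pairs - e_r) * math.log(1 - p_r)
    return total


def entropy(p):
    """Gibbs-Shannon entropy h(p) used in Eq. (B4)."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
dendrogram = {6: (1, 2), 7: (0, 6), 8: (4, 5), 9: (3, 8), 10: (7, 9)}
counts = node_counts(dendrogram, edges)
print(math.exp(log_likelihood(counts)))  # about 0.0433, i.e. (1/9)(8/9)^8
# Eq. (B4) gives the same value as a sum of entropy terms.
b4 = -sum(l * r * entropy(e / (l * r)) for e, l, r in counts.values())
print(abs(b4 - log_likelihood(counts)) < 1e-12)  # True
```

Nodes whose $\bar{p}_r$ is exactly 0 or 1 contribute a factor of 1, so only the root term survives in this example.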

It is often convenient to work with the logarithm of the likelihood,

$$\log \mathcal{L}(D) = -\sum_{r \in D} L_r R_r \, h(\bar{p}_r), \qquad \text{(B4)}$$

where $h(p) = -p \log p - (1 - p) \log(1 - p)$ is the Gibbs-Shannon entropy function. Note that each term $-L_r R_r h(\bar{p}_r)$ is maximized when $\bar{p}_r$ is close to 0 or to 1, i.e., when the entropy is minimized. In other words, high-likelihood dendrograms are those that partition the vertices into groups between which connections are either very common or very rare.

We now use a Markov chain Monte Carlo method to sample dendrograms $D$ with probability proportional to their likelihood $\mathcal{L}(D)$. To create the Markov chain we need to pick a set of transitions between possible dendrograms. The transitions we use consist of rearrangements of subtrees of the dendrogram as follows. First, note that each internal node $r$ of a dendrogram $D$ is associated with three subtrees: the subtrees $s$, $t$ descended from its two daughters, and the subtree $u$ descended from its sibling. As Figure S2 shows, there are two ways we can reorder these subtrees without disturbing any of their internal relationships. Each step of our Markov chain consists first of choosing an internal node $r$ uniformly at random (other than the root) and then choosing uniformly at random between the two alternate configurations of the subtrees associated with that node and adopting that configuration. The result is a new dendrogram $D'$. It is straightforward to show that transitions of this type are ergodic, i.e., that any pair of finite dendrograms can be connected by a finite series of such transitions.

FIG. S2: Each internal node $r$ of the dendrogram has three associated subtrees $s$, $t$, and $u$, which can be placed in any of three configurations. (Note that the topology of the dendrogram depends only on the sibling and parent relationships; the order, left to right, in which they are depicted is irrelevant.)

Once we have generated our new dendrogram $D'$ we accept or reject that dendrogram according to the standard Metropolis-Hastings rule.⁴ Specifically, we accept the transition $D \to D'$ if $\Delta \log \mathcal{L} = \log \mathcal{L}(D') - \log \mathcal{L}(D)$ is nonnegative, so that $D'$ is at least as likely as $D$; otherwise we accept the transition with probability $\exp(\Delta \log \mathcal{L}) = \mathcal{L}(D')/\mathcal{L}(D)$. If the transition is not accepted, the dendrogram remains the same on this step of the chain. The Metropolis-Hastings rule ensures detailed balance and, in combination with the ergodicity of the transitions, guarantees a limiting probability distribution over dendrograms that is proportional to the likelihood, $P(D) \propto \mathcal{L}(D)$. The quantity $\Delta \log \mathcal{L}$ can be calculated easily, since the only terms in Eq. (B4) that change from $D$ to $D'$ are those involving the subtrees $s$, $t$, and $u$ associated with the chosen node.

The Markov chain appears to converge relatively quickly, with the likelihood reaching a plateau after roughly $O(n^2)$ steps. This is not a rigorous performance guarantee, however, and indeed there are mathematical results for similar Markov chains that suggest that equilibration could take exponential time in the worst case.⁵ Still, as our results here show, the method seems to work quite well in practice. The algorithm is able to handle networks with up to a few thousand vertices in a reasonable amount of computer time.

We find that there are typically many dendrograms with roughly equal likelihoods, which reinforces our contention that it is important to sample the distribution of dendrograms rather than merely focusing on the most likely one.


4 M. E. J. Newman and G. T. Barkema, "Monte Carlo Methods in Statistical 
Physics." Clarendon Press, Oxford (1999). 
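The subtree-rearrangement move and the Metropolis-Hastings rule of Appendix B can be combined into a compact sampler. The following is a minimal sketch under stated assumptions: dendrograms are dicts from internal node to (left, right) children, edges are (i, j) pairs with i < j, and all function names are hypothetical. For clarity it recomputes the full log-likelihood at every step, whereas the text notes that only the terms involving $s$, $t$, and $u$ change; an efficient implementation would update those terms locally.

```python
import math
import random
from itertools import product


def leaves_under(node, children):
    """Return the leaves descended from `node` (a leaf returns itself)."""
    if node not in children:
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def log_likelihood(children, edges):
    """Eq. (B4): sum of E log p + (LR - E) log(1 - p) with p = E / (LR)."""
    total = 0.0
    for left, right in children.values():
        lv, rv = leaves_under(left, children), leaves_under(right, children)
        e_r = sum((min(i, j), max(i, j)) in edges for i, j in product(lv, rv))
        pairs = len(lv) * len(rv)
        if 0 < e_r < pairs:  # p_r of exactly 0 or 1 contributes log 1 = 0
            p_r = e_r / pairs
            total += e_r * math.log(p_r) + (pairs - e_r) * math.log(1 - p_r)
    return total


def propose(children, rng):
    """Return a copy of D with one subtree rearrangement (Fig. S2)."""
    parent = {c: r for r, pair in children.items() for c in pair}
    r = rng.choice([node for node in children if node in parent])  # not root
    q = parent[r]
    u = children[q][1] if children[q][0] == r else children[q][0]  # sibling
    s, t = children[r]
    new = dict(children)
    if rng.random() < 0.5:
        new[r], new[q] = (s, u), (r, t)  # exchange t and u
    else:
        new[r], new[q] = (t, u), (r, s)  # exchange s and u
    return new


def mcmc(children, edges, steps, seed=0):
    """Metropolis-Hastings walk targeting P(D) proportional to L(D)."""
    rng = random.Random(seed)
    current = log_likelihood(children, edges)
    for _ in range(steps):
        candidate = propose(children, rng)
        proposed = log_likelihood(candidate, edges)
        delta = proposed - current
        if delta >= 0 or rng.random() < math.exp(delta):
            children, current = candidate, proposed
    return children, current


# Hypothetical data: two triangles joined by edge (2, 3); a start that mixes them.
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
start = {6: (0, 3), 7: (1, 4), 8: (2, 5), 9: (6, 7), 10: (9, 8)}
final, log_l = mcmc(start, edges, steps=2000, seed=1)
print(log_l, log_likelihood(start, edges))
```

Because the set of non-root internal nodes never changes and each move can be undone by the same node choice, the proposal is symmetric, so the plain Metropolis acceptance ratio suffices.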



APPENDIX C: RESAMPLING FROM THE 
HIERARCHICAL RANDOM GRAPH 

The procedure for resampling from the hierarchical random graph is as follows.

1. Initialize the Markov chain by choosing a random starting dendrogram.

2. Run the Monte Carlo algorithm until equilibrium is reached.

3. Sample dendrograms at regular intervals thereafter from those generated by the Markov chain.

4. For each sampled dendrogram $D$, create a resampled graph $G'$ with $n$ vertices by placing an edge between each of the $n(n-1)/2$ vertex pairs $(i, j)$ with independent probability $p_{ij} = \bar{p}_r$, where $r$ is the lowest common ancestor of $i$ and $j$ in $D$ and $\bar{p}_r$ is given by Eq. (B2). (In principle, there is nothing to prevent us from generating many resampled graphs from a dendrogram, but in the calculations described in this paper we generate only one from each dendrogram.)
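Step 4 and the subsequent averaging can be sketched as follows. This assumes the sampled dendrograms are already available as dicts from internal node to children (steps 1 to 3 are not repeated here) and uses mean degree as the simplest stand-in for the statistics mentioned in the text; `resample`, `mean_degree`, and `ensemble_average` are hypothetical names.

```python
import random
from itertools import product
from statistics import mean


def leaves_under(node, children):
    """Return the leaves descended from `node` (a leaf returns itself)."""
    if node not in children:
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def resample(children, edges, rng):
    """Step 4: connect each pair with probability E_r / (L_r R_r), Eq. (B2)."""
    new_edges = set()
    for left, right in children.values():
        pairs = [(min(i, j), max(i, j))
                 for i, j in product(leaves_under(left, children),
                                     leaves_under(right, children))]
        p_r = sum(pair in edges for pair in pairs) / len(pairs)
        for pair in pairs:
            if rng.random() < p_r:
                new_edges.add(pair)
    return new_edges


def mean_degree(edges, n):
    """Average degree 2m/n of a graph with edge set `edges` on n vertices."""
    return 2 * len(edges) / n


def ensemble_average(dendrograms, edges, n, statistic, seed=0):
    """Average statistic(G', n) over one resampled graph G' per dendrogram."""
    rng = random.Random(seed)
    return mean(statistic(resample(D, edges, rng), n) for D in dendrograms)


# Hypothetical data: two disjoint triangles and one sampled dendrogram, used twice.
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)}
D = {6: (1, 2), 7: (0, 6), 8: (4, 5), 9: (3, 8), 10: (7, 9)}
print(ensemble_average([D, D], edges, 6, mean_degree))  # 2.0
```

In this toy case every $\bar{p}_r$ is 0 or 1, so each resampled graph reproduces the original exactly; with real data the resampled graphs differ in detail while sharing the fitted hierarchy.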

After generating many samples in this way, we can compute averages of network statistics such as the degree distribution, the clustering coefficient, the vertex-vertex distance distribution, and so forth. Thus, in a way similar to Bayesian model averaging,⁶ we can estimate the distribution of network statistics defined by the equilibrium ensemble of dendrograms.

For the construction of consensus dendrograms such as the one shown in Fig. 2a, we found it useful to weight the most likely dendrograms more heavily, giving them weight proportional to the square of their likelihood, in order to extract a coherent consensus structure from the equilibrium set of models.
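The text does not name the specific consensus technique from phylogenetics; the sketch below assumes the weighted majority-rule consensus, a standard choice, with each sampled dendrogram weighted by the square of its likelihood as described above. A cluster (the leaf set below an internal node) is kept if its total weight exceeds half the total, and clusters passing that threshold are mutually nested or disjoint, so together they form a (possibly non-binary) tree. Function names are hypothetical.

```python
import math
from collections import defaultdict


def leaves_under(node, children):
    """Return the leaves descended from `node` (a leaf returns itself)."""
    if node not in children:
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def clusters(children):
    """Leaf sets of all internal nodes of one dendrogram."""
    return {frozenset(leaves_under(r, children)) for r in children}


def weighted_majority_consensus(dendrograms, log_likelihoods):
    """Keep clusters whose L(D)^2-weighted support exceeds half the total."""
    top = max(log_likelihoods)  # rescale before exponentiating, for stability
    weights = [math.exp(2.0 * (ll - top)) for ll in log_likelihoods]
    support = defaultdict(float)
    for children, w in zip(dendrograms, weights):
        for cluster in clusters(children):
            support[cluster] += w
    half = sum(weights) / 2.0
    return {cluster for cluster, s in support.items() if s > half}


# Two hypothetical samples that disagree about how four vertices pair up.
d_a = {4: (0, 1), 5: (2, 3), 6: (4, 5)}
d_b = {4: (0, 2), 5: (1, 3), 6: (4, 5)}
print(weighted_majority_consensus([d_a, d_b], [0.0, -1.0]))
```

With equal likelihoods only the trivial all-vertex cluster survives; squaring the likelihoods lets the more likely sample's groupings dominate, which is the effect the text describes.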



5 E. Mossel and E. Vigoda, "Phylogenetic MCMC Are Misleading on Mix- 
tures of Trees." Science 309, 2207 (2005) 

6 T. Hastie, R. Tibshirani and J. Friedman, "The Elements of Statistical 
Learning." Springer, New York (2001). 



FIG. S3: Application of our hierarchical decomposition to the network of grassland species interactions. a, Original (blue) and resampled (red) degree distributions (horizontal axis: degree, k). b, Original and resampled distributions of vertex-vertex distances (horizontal axis: distance, d).



APPENDIX D: PREDICTING MISSING CONNECTIONS 

Our algorithm for using hierarchical random graphs to predict missing connections is as follows.

1. Initialize the Markov chain by choosing a random starting dendrogram.

2. Run the Monte Carlo algorithm until equilibrium is reached.

3. Sample dendrograms at regular intervals thereafter from those generated by the Markov chain.

4. For each pair of vertices $i, j$ for which there is not already a known connection, calculate the mean probability $\langle p_{ij} \rangle$ that they are connected by averaging over the corresponding probabilities $p_{ij}$ in each of the sampled dendrograms $D$.

5. Sort these pairs $i, j$ in decreasing order of $\langle p_{ij} \rangle$ and predict that the highest-ranked ones have missing connections.
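Steps 4 and 5 can be sketched in Python, assuming the sampled dendrograms from steps 1 to 3 are given as dicts from internal node to children and that each $p_{ij}$ takes the maximum-likelihood value of Eq. (B2) at the pair's lowest common ancestor. The names `pair_probabilities` and `predict_missing` are hypothetical.

```python
from itertools import combinations, product


def leaves_under(node, children):
    """Return the leaves descended from `node` (a leaf returns itself)."""
    if node not in children:
        return [node]
    left, right = children[node]
    return leaves_under(left, children) + leaves_under(right, children)


def pair_probabilities(children, edges):
    """p_ij for every pair: E_r / (L_r R_r) at the pair's lowest common ancestor."""
    table = {}
    for left, right in children.values():
        pairs = [(min(i, j), max(i, j))
                 for i, j in product(leaves_under(left, children),
                                     leaves_under(right, children))]
        p_r = sum(pair in edges for pair in pairs) / len(pairs)
        for pair in pairs:
            table[pair] = p_r
    return table


def predict_missing(dendrograms, edges, n, k):
    """Steps 4-5: average p_ij over samples, return the top-k unobserved pairs."""
    candidates = [pair for pair in combinations(range(n), 2) if pair not in edges]
    mean_p = dict.fromkeys(candidates, 0.0)
    for children in dendrograms:
        table = pair_probabilities(children, edges)
        for pair in candidates:
            mean_p[pair] += table[pair] / len(dendrograms)
    ranked = sorted(candidates, key=lambda pair: mean_p[pair], reverse=True)
    return ranked[:k], mean_p


# Hypothetical data: triangle {0, 1, 2} with edge (1, 2) hidden, plus triangle {3, 4, 5}.
edges = {(0, 1), (0, 2), (3, 4), (3, 5), (4, 5)}
D = {6: (0, 1), 7: (6, 2), 8: (4, 5), 9: (3, 8), 10: (7, 9)}
top, mean_p = predict_missing([D], edges, 6, k=1)
print(top)  # [(1, 2)]
```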

In general, we find that the top 1% of such predictions are highly accurate. However, for large networks, even the top 1% can be an unreasonably large number of candidates to check experimentally. In many contexts, researchers may want to consider using the procedure interactively, i.e., predicting a small number of missing connections, checking them experimentally, adding the results to the network, and running the algorithm again to predict additional connections.

The alternative prediction methods we compared against, which were previously investigated by Liben-Nowell and Kleinberg,⁷ consist of giving each pair $i, j$ of vertices a score, sorting pairs in decreasing order of their score, and predicting that those with the highest scores are the most likely to be connected. Several different types of scores were investigated, defined as follows, where $\Gamma(j)$ is the set of vertices connected to $j$.

7 D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks." Proc. Internat. Conf. on Info. and Know. Manage. (2003).

1. Common neighbors: $\mathrm{score}(i,j) = |\Gamma(i) \cap \Gamma(j)|$, the number of common neighbors of vertices $i$ and $j$.

2. Jaccard coefficient: $\mathrm{score}(i,j) = |\Gamma(i) \cap \Gamma(j)| \, / \, |\Gamma(i) \cup \Gamma(j)|$, the fraction of all neighbors of $i$ and $j$ that are neighbors of both.

3. Degree product: $\mathrm{score}(i,j) = |\Gamma(i)| \, |\Gamma(j)|$, the product of the degrees of $i$ and $j$.

4. Short paths: $\mathrm{score}(i,j)$ is 1 divided by the length of the shortest path through the network from $i$ to $j$ (or zero for vertex pairs that are not connected by any path).
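The four baseline scores can be written out directly. This is a minimal sketch assuming an undirected graph given as a set of (i, j) pairs on vertices 0 to n-1 and distinct vertices $i \neq j$; the function names are illustrative and are not taken from the cited study. The shortest-path score uses breadth-first search.

```python
from collections import deque


def neighbor_sets(edges, n):
    """Gamma(v) for every vertex v of an undirected graph."""
    nbrs = {v: set() for v in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return nbrs


def common_neighbors(nbrs, i, j):
    return len(nbrs[i] & nbrs[j])


def jaccard(nbrs, i, j):
    union = nbrs[i] | nbrs[j]
    return len(nbrs[i] & nbrs[j]) / len(union) if union else 0.0


def degree_product(nbrs, i, j):
    return len(nbrs[i]) * len(nbrs[j])


def short_path(nbrs, i, j):
    """1 / (shortest-path length from i to j) via BFS; 0 if j is unreachable."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if v == j:
            return 1.0 / dist[v]
        for w in nbrs[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return 0.0


# Hypothetical graph: path 0-1-2-3 plus an isolated vertex 4.
nbrs = neighbor_sets({(0, 1), (1, 2), (2, 3)}, 5)
print(common_neighbors(nbrs, 0, 2), jaccard(nbrs, 0, 2),
      degree_product(nbrs, 1, 2), short_path(nbrs, 0, 3), short_path(nbrs, 0, 4))
# 1 0.5 4 0.3333333333333333 0.0
```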

One way to quantify the success of a prediction method, used by previous authors who have studied link prediction problems,⁷ is the ratio between the probability that the top-ranked pair is connected and the probability that a randomly chosen pair of vertices, which do not have an observed connection between them, is connected. Figure S4 shows the average value of this ratio as a function of the percentage of the network shown to the algorithm, for each of our three networks. Even when fully 50% of the network is missing, our method predicts missing connections about ten times better than chance for all three networks. In practical terms, this means that the amount of work required of the experimenter to discover a new connection is reduced by a factor of 10, an enormous improvement by any standard. If a greater fraction of the network is known, the accuracy becomes even greater, rising as high as 200 times better than chance when only a few connections are missing.
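For a single hold-out trial, this ratio reduces to a simple quotient: the top-ranked pair is either a missing edge or not, and the chance rate is the number of missing edges divided by the number of unobserved pairs. A minimal sketch, with a hypothetical function name and data; averaging it over many random hold-out trials gives an estimate of the kind of ratio shown in Fig. S4.

```python
def top_pair_ratio(ranked_pairs, missing_edges, n_unobserved):
    """Hit rate of the top-ranked pair divided by the chance rate.

    ranked_pairs: unobserved pairs sorted by decreasing score.
    missing_edges: the held-out true edges.
    n_unobserved: number of vertex pairs with no observed connection.
    """
    hit = 1.0 if ranked_pairs[0] in missing_edges else 0.0
    chance = len(missing_edges) / n_unobserved
    return hit / chance


# 5 missing edges among 50 unobserved pairs: chance rate 0.1.
missing = {(1, 2), (4, 5), (0, 5), (2, 5), (3, 4)}
print(top_pair_ratio([(1, 2), (0, 3)], missing, 50))  # 10.0
```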

We note, however, that using this ratio to judge prediction algorithms has an important disadvantage. Some missing connections are much easier to predict than others: for instance, if a network has a heavy-tailed degree distribution and we remove a randomly chosen subset of the edges, the chances are excellent that two high-degree vertices will have a missing connection, and such a connection can be easily predicted by simple heuristics such as those discussed above. The AUC statistic used in the text, by contrast, looks at an algorithm's overall ability to rank all the missing connections over nonexistent ones, not just those that are easiest to predict.

FIG. S4: Further comparison of link prediction algorithms. Data points represent the average ratio between the probability that the top-ranked pair of vertices is in fact connected and the corresponding probability for a randomly chosen pair, as a function of the fraction of the connections known to the algorithm. For each network (a, terrorist associations; b, T. pallidum metabolites; c, grassland species interactions), we compare our method with simpler methods such as guessing that two vertices are connected if they share common neighbors, have a high degree product, or have a short path between them.

Finally, we have investigated the performance of each of the prediction algorithms on purely random (i.e., Erdős–Rényi) graphs. As expected, no method performs better than chance in this case, since the connections are completely independent random events and there is no structure to discover. We also tested each algorithm on a graph with a power-law degree distribution generated according to the configuration model.⁸ In this case, guessing that high-degree vertices are likely to be connected performs quite well, whereas the method based on the hierarchical random graph performs poorly since these graphs have no hierarchical structure to discover.

8 M. Molloy and B. Reed, "A critical point for random graphs with a given degree sequence." Random Structures and Algorithms 6, 161-179 (1995).



